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Options for Securing RTP Sessions 
Abstract 


The Real-time Transport Protocol (RTP) is used in a large number of 
different application domains and environments. This heterogeneity 
implies that different security mechanisms are needed to provide 
services such as confidentiality, integrity, and source 
authentication of RTP and RTP Control Protocol (RTCP) packets 
suitable for the various environments. The range of solutions makes 
it difficult for RTP-based application developers to pick the most 
suitable mechanism. This document provides an overview of a number 
of security solutions for RTP and gives guidance for developers on 
how to choose the appropriate security mechanism. 


Status of This Memo 


This document is not an Internet Standards Track specification; it is 
published for informational purposes. 


This document is a product of the Internet Engineering Task Force 


(IETF). It represents the consensus of the IETF community. It has 
received public review and has been approved for publication by the 
Internet Engineering Steering Group (IESG). Not all documents 


approved by the IESG are a candidate for any level of Internet 
Standard; see Section 2 of RFC 5741. 


Information about the current status of this document, any errata, 


and how to provide feedback on it may be obtained at 
http://www.rfc-editor.org/info/rfc7201. 
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Introduction 


The Real-time Transport Protocol (RTP) [RFC3550] is widely used ina 
large variety of multimedia applications, including Voice over IP 
(VoIP), centralized multimedia conferencing, sensor data transport, 
and Internet television (IPTV) services. These applications can 
range from point-to-point phone calls, through centralized group 
teleconferences, to large-scale television distribution services. 
The types of media can vary significantly, as can the signaling 
methods used to establish the RTP sessions. 


So far, this multidimensional heterogeneity has prevented development 
of a single security solution that meets the needs of the different 
applications. Instead, a significant number of different solutions 
have been developed to meet different sets of security goals. This 
makes it difficult for application developers to know what solutions 
exist and whether their properties are appropriate. This memo gives 
an overview of the available RTP solutions and provides guidance on 
their applicability for different application domains. It also 
attempts to provide an indication of actual and intended usage at the 
time of writing as additional input to help with considerations such 
as interoperability, availability of implementations, etc. The 
guidance provided is not exhaustive, and this memo does not provide 
normative recommendations. 


It is important that application developers consider the security 
goals and requirements for their application. The IETF considers it 
important that protocols implement secure modes of operation and 
makes them available to users [RFC3365]. Because of the 
heterogeneity of RTP applications and use cases, however, a single 
security solution cannot be mandated [RFC7202]. Instead, application 
developers need to select mechanisms that provide appropriate 
security for their environment. It is strongly encouraged that 
common mechanisms be used by related applications in common 
environments. The IETF publishes guidelines for specific classes of 
applications, so it is worth searching for such guidelines. 


The remainder of this document is structured as follows. Section 2 
provides additional background. Section 3 outlines the available 
security mechanisms at the time of this writing and lists their key 
security properties and constraints. Section 4 provides guidelines 
and important aspects to consider when securing an RTP application. 
Finally, in Section 5, we give some examples of application domains 
where guidelines for security exist. 
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2. Background 


RTP can be used in a wide variety of topologies due to its support 
for point-to-point sessions, multicast groups, and other topologies 
built around different types of RTP middleboxes. In the following, 
we review the different topologies supported by RTP to understand 
their implications for the security properties and trust relations 
that can exist in RTP sessions. 


2.1. Point-to-Point Sessions 


The most basic use case is two directly connected endpoints, shown in 
Figure 1, where A has established an RTP session with B. In this 
case, the RTP security is primarily about ensuring that any third 
party be unable to compromise the confidentiality and integrity of 
the media communication. This requires confidentiality protection of 
the RTP session, integrity protection of the RTP/RTCP packets, and 
source authentication of all the packets to ensure no man-in-the- 
middle (MITM) attack is taking place. 


The source authentication can also be tied to a user or an endpoint’s 
verifiable identity to ensure that the peer knows with whom they are 
communicating. Here, the combination of the security protocol 
protecting the RTP session (and, hence, the RTP and RTCP traffic) and 
the key management protocol becomes important to determine what 
security claims can be made. 


Figure 1: Point-to-Point Topology 
2.2. Sessions Using an RTP Mixer 


An RTP mixer is an RTP session-level middlebox around which one can 
build a multiparty RTP-based conference. The RTP mixer might 
actually perform media mixing, like mixing audio or compositing video 
images into a new media stream being sent from the mixer to a given 
participant, or it might provide a conceptual stream; for example, 
the video of the current active speaker. From a security point of 
view, the important features of an RTP mixer are that it generates a 
new media stream, has its own source identifier, and does not simply 
forward the original media. 
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An RTP session using a mixer might have a topology like that in 
Figure 2. In this example, participants A through D each send 
unicast RTP traffic to the RTP mixer, and receive an RTP stream from 
the mixer, comprising a mixture of the streams from the other 


participants. 

+---+ 4+—------------ + +—---+ 
| a [<---> |<---->| B | 
+---+ +---+ 

Mixer 

+---+ | | +---+ 
| c [<---> [<---->| D | 
+---+ 4+—------------ + +---+ 


Figure 2: Example RTP Mixer Topology 


A consequence of an RTP mixer having its own source identifier and 
acting as an active participant towards the other endpoints is that 
the RTP mixer needs to be a trusted device that has access to the 
security context(s) established. The RTP mixer can also become a 
security-enforcing entity. For example, a common approach to secure 
the topology in Figure 2 is to establish a security context between 
the mixer and each participant independently and have the mixer 
source authenticate each peer. The mixer then ensures that one 
participant cannot impersonate another. 


2.3. Sessions Using an RTP Translator 


RTP translators are middleboxes that provide various levels of 
in-network media translation and transcoding. Their security 
properties vary widely, depending on which type of operations they 
attempt to perform. We identify and discuss three different 
categories of RTP translators: transport translators, gateways, and 
media transcoders. 


2.3.1. Transport Translator (Relay) 


A transport translator [RFC5117] operates on a level below RTP and 
RTCP. It relays the RIP/RTICP traffic from one endpoint to one or 


more other addresses. This can be done based only on IP addresses 
and transport protocol ports, and each receive port on the translator 
can have a very basic list of where to forward traffic. Transport 


translators also need to implement ingress filtering to prevent 
random traffic from being forwarded that isn’t coming from a 
participant in the conference. 


Figure 3 shows an example transport translator, where traffic from 
any one of the four participants will be forwarded to the other three 
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participants unchanged. The resulting topology is very similar to an 
Any Source Multicast (ASM) session (as discussed in Section 2.4) but 
is implemented at the application layer. 


+---+ +------------ + +--+ 

eqs sia 

+---+ | Relay | +---+ 
| Translator | 

+---+ +---+ 

| ce |<----> <---->| D | 

+---+ +------------ + +---+ 


Figure 3: RTP Relay Translator Topology 


A transport translator can often operate without needing access to 
the security context, as long as the security mechanism does not 
provide protection over the transport-layer information. A transport 
translator does, however, make the group communication visible and, 
thus, can complicate keying and source authentication mechanisms. 
This is further discussed in Section 2.4. 


2.3.2. Gateway 


Gateways are deployed when the endpoints are not fully compatible. 
Figure 4 shows an example topology. The functions a gateway provides 
can be diverse and range from transport-layer relaying between two 
domains not allowing direct communication, via transport or media 
protocol function initiation or termination, to protocol- or media- 
encoding translation. The supported security protocol might even be 
one of the reasons a gateway is needed. 


+---+ 4+----------- + +---+ 
| A |<---->| Gateway |<---->| B | 
+---+ 4+----------- + +---+ 


Figure 4: RTP Gateway Topology 


The choice of security protocol, and the details of the gateway 
function, will determine if the gateway needs to be trusted with 
access to the application security context. Many gateways need to be 
trusted by all peers to perform the translation; in other cases, some 
or all peers might not be aware of the presence of the gateway. The 
security protocols have different properties depending on the degree 
of trust and visibility needed. Ensuring communication is possible 
without trusting the gateway can be a strong incentive for accepting 
different security properties. Some security solutions will be able 
to detect the gateways as manipulating the media stream, unless the 
gateway is a trusted device. 
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3. Media Transcoder 


A media transcoder is a special type of gateway device that changes 
the encoding of the media being transported by RTP. The discussion 
in Section 2.3.2 applies. A media transcoder alters the media data 
and, thus, needs to be trusted with access to the security context. 


Any Source Multicast 


Any Source Multicast [RFC1112] is the original multicast model where 
any multicast group participant can send to the multicast group and 
get their packets delivered to all group members (see Figure 5). 

This form of communication has interesting security properties due to 
the many-to-many nature of the group. Source authentication is 
important, but all participants with access to the group security 
context will have the necessary secrets to decrypt and verify the 
integrity of the traffic. Thus, use of any group security context 
fails if the goal is to separate individual sources; alternate 
solutions are needed. 


+ Multicast + 
+---+ \ Network / +---+ 


Figure 5: Any Source Multicast (ASM) Group 


In addition, the potential large size of multicast groups creates 
some considerations for the scalability of the solution and how the 
key management is handled. 


Source-Specific Multicast 


Source-Specific Multicast (SSM) [RFC4607] allows only a specific 
endpoint to send traffic to the multicast group, irrespective of the 
number of RTP media sources. The endpoint is known as the media 
distribution source. For the RTP session to function correctly with 
RTCP over an SSM session, extensions have been defined in [RFC5760]. 
Figure 6 shows a sample SSM-based RTP session where several media 
sources, MS1...MSm, all send media to a distribution source, which 
then forwards the media data to the SSM group for delivery to the 
receivers, R1...Rn, and the feedback targets, FT1...FTn. RTCP 
reception quality feedback is sent unicast from each receiver to one 
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of the feedback targets. The feedback targets aggregate reception 
quality feedback and forward it upstream towards the distribution 
source. The distribution source forwards (possibly aggregated and 
summarized) reception feedback to the SSM group and back to the 
original media sources. The feedback targets are also members of the 
SSM group and receive the media data, so they can send unicast repair 
data to the receivers in response to feedback if appropriate. 


+----- + +----- + +----- + 
| msı | | Ms2 | | MSm | 
+----- + +----- + +----- + 

| | | 
V V V 

+--------------------------------- + 

| Distribution Source 

+-------- + 

| FT Agg | | 

+-------- +------------------------ + 

me | 
| 
HF a, atari tanner ale a + 
| 
/ \ 

+------ + / \ +----- + 
FT1 |<----+ +----- >| FT2 | 

+------ + / \ sexe oe + 

^ ^ 74 \ ^ ^ 
/ \ 
/ \ 
/ \ 
/\ /\ 
/. \ Es 
Va FeV Vo eV. 
+----+ +----+ +----+ +----+ 
| R1 | | R2 | ... [Rn-1| | Rn | 
+----+ +----+ +----+ +----+ 


Figure 6: Example SSM-Based RTP Session with Two Feedback Targets 


The use of SSM makes it more difficult to inject traffic into the 
multicast group, but not impossible. Source authentication 
requirements apply for SSM sessions, too; an individual verification 
of who sent the RTP and RTCP packets is needed. An RTP session using 
SSM will have a group security context that includes the media 
sources, distribution source, feedback targets, and the receivers. 
Each has a different role and will be trusted to perform different 
actions. For example, the distribution source will need to 
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authenticate the media sources to prevent unwanted traffic from being 
distributed via the SSM group. Similarly, the receivers need to 
authenticate both the distribution source and their feedback target 
to prevent injection attacks from malicious devices claiming to be 
feedback targets. An understanding of the trust relationships and 
group security context is needed between all components of the 
system. 


3. Security Options 


This section provides an overview of security requirements and the 
current RTP security mechanisms that implement those requirements. 
This cannot be a complete survey, since new security mechanisms are 
defined regularly. The goal is to help applications designers by 
reviewing the types of solutions that are available. This section 
will use a number of different security-related terms, as described 
in the Internet Security Glossary, Version 2 [RFC4949]. 


3.1. Secure RTP 


The Secure Real-time Transport Protocol (SRTP) [RFC3711] is one of 
the most commonly used mechanisms to provide confidentiality, 
integrity protection, source authentication, and replay protection 
for RTP. SRTP was developed with RIP header compression and third- 
party monitors in mind. Thus, the RIP header is not encrypted in RTP 
data packets, and the first 8 bytes of the first RTCP packet header 
in each compound RTCP packet are not encrypted. The entirety of RTP 
packets and compound RTCP packets are integrity protected. This 
allows RTP header compression to work and lets third-party monitors 
determine what RTP traffic flows exist based on the synchronization 
source (SSRC) fields, but it protects the sensitive content. 


SRTP works with transforms where different combinations of encryption 
algorithm, authentication algorithm, and pseudorandom function can be 
used, and the authentication tag length can be set to any value. 

SRTP can also be easily extended with additional cryptographic 
transforms. This gives flexibility but requires more security 
knowledge by the application developer. To simplify things, Session 
Description Protocol (SDP) security descriptions (see Section 3.1.3) 
and Datagram Transport Layer Security Extension for SRTP (DTLS-SRTP) 
(see Section 3.1.1) use predefined combinations of transforms, known 
as SRTP crypto suites and SRTP protection profiles, that bundle 
together transforms and other parameters, making them easier to use 
but reducing flexibility. The Multimedia Internet Keying (MIKEY) 
protocol (see Section 3.1.2) provides flexibility to negotiate the 
full selection of transforms. At the time of this writing, the 
following transforms, SRTP crypto suites, and SRTP protection 
profiles are defined or under definition: 
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AES-CM and HMAC-SHA-1: AES Counter Mode encryption with 128-bit keys 
combined with 160-bit keyed HMAC-SHA-1 with an 80-bit 
authentication tag. This is the default cryptographic transform 
that needs to be supported. The transforms are defined in SRTP 
[RFC3711], with the corresponding SRIP crypto suite defined in 
[RFC4568] and SRTP protection profile defined in [RFC5764]. 


AES-£8 and HMAC-SHA-1: AES f£8-mode encryption using 128-bit keys 
combined with keyed HMAC-SHA-1 using 80-bit authentication. The 
transforms are defined in [RFC3711], with the corresponding SRIP 
crypto suite defined in [RFC4568]. The corresponding SRIP 
protection profile is not defined. 


SEED: A Korean national standard cryptographic transform that is 
defined to be used with SRTP in [RFC5669]. Three options are 
defined: one using SHA-1 authentication, one using Counter Mode 
with Cipher Block Chaining Message Authentication Code (CBC-MAC), 
and one using Galois Counter Mode. 


ARIA: A Korean block cipher [ARIA-SRTP] that supports 128-, 192-, 
and 256-bit keys. It also defines three options: Counter Mode 
where combined with HMAC-SHA-1 with 80- or 32-bit authentication 
tags, Counter Mode with CBC-MAC, and Galois Counter Mode. It also 
defines a different key derivation function than the AES-—based 
systems. 


AES-192-CM and AES-256-CM: Cryptographic transforms for SRTP based 
on AES-192 and AES-256 Counter Mode encryption and 160-bit keyed 
HMAC-SHA-1 with 80- and 32-bit authentication tags. These provide 
192- and 256-bit encryption keys, but otherwise match the default 
128-bit AES-CM transform. The transforms are defined in [RFC3711] 
and [RFC6188], and the SRTP crypto suites are defined in 
[RFC6188]. 


AES-GCM and AES-CCM: AES Galois Counter Mode and AES Counter Mode 
with CBC-MAC for AES-128 and AES-256. This authentication is 
included in the cipher text, which becomes expanded with the 
length of the authentication tag instead of using the SRTP 
authentication tag. This is defined in [AES-GCM]. 


NULL: SRTP [RFC3711] also provides a NULL cipher that can be used 
when no confidentiality for RIP/RTCP is requested. The 
corresponding SRTP protection profile is defined in [RFC5764]. 


The source authentication guarantees provided by SRIP depend on the 
cryptographic transform and key management used. Some transforms 
give strong source authentication even in multiparty sessions; others 
give weaker guarantees and can authenticate group membership but not 
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sources. Timed Efficient Stream Loss-Tolerant Authentication (TESLA) 
[RFC4383] offers a complement to the regular symmetric keyed 
authentication transforms, like HMAC-SHA-1, and can provide 
per-source authentication in some group communication scenarios. The 
downside is the need for buffering the packets for a while before 
authenticity can be verified. 


[RFC4771] defines a variant of the authentication tag that enables a 
receiver to obtain the Roll over Counter for the RTP sequence number 
that is part of the Initialization Vector (IV) for many cryptographic 
transforms. This enables quicker and easier options for joining a 
long-lived RTP group; for example, a broadcast session. 


RTP header extensions are normally carried in the clear and are only 


integrity protected in SRTP. This can be problematic in some cases, 
so [RFC6904] defines an extension to also encrypt selected header 
extensions. 


SRTP is specified and deployed in a number of RTP usage contexts; 
significant support is provided in SIP-established VoIP clients, 
including IP Multimedia Subsystems (IMS), and in the Real Time 
Streaming Protocol (RTSP) [RTSP] and RTP-based media streaming. 
Thus, SRTP in general is widely deployed. When it comes to 
cryptographic transforms, the default (AES-CM and HMAC-SHA-1) is the 
most commonly used, but it might be expected that AES-GCM, 
AES-192-CM, and AES-256-CM will gain usage in future, especially due 
to the AES- and GCM-specific instructions in new CPUs. 


SRTP does not contain an integrated key management solution; instead, 
it relies on an external key management protocol. There are several 
protocols that can be used. The following sections outline some 
popular schemes. 


3.1.1. Key Management for SRTP: DTLS-SRTP 


A Datagram Transport Layer Security (DTLS) extension exists for 
establishing SRTP keys [RFC5763] [RFC5764]. This extension provides 
secure key exchange between two peers, enabling Perfect Forward 
Secrecy (PFS) and binding strong identity verification to an 
endpoint. PFS is a property of the key agreement protocol that 
ensures that a session key derived from a set of long-term keys will 
not be compromised if one of the long-term keys is compromised in the 
future. The default key generation will generate a key that contains 
material contributed by both peers. The key exchange happens in the 
media plane directly between the peers. The common key exchange 
procedures will take two round trips assuming no losses. Transport 
Layer Security (TLS) resumption can be used when establishing 
additional media streams with the same peer, and it reduces the setup 
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time to one RTT for these streams (see [RFC5764] for a discussion of 
TLS resumption in this context). 


The actual security properties of an established SRTP session using 
DTLS will depend on the cipher suites offered and used, as well as 
the mechanism for identifying the endpoints of the handshake. For 
example, some cipher suites provide PFS, while others do not. When 
using DTLS, the application designer needs to select which cipher 
suites DTLS-SRTP can offer and accept so that the desired security 
properties are achieved. The next choice is how to verify the 
identity of the peer endpoint. One choice can be to rely on the 
certificates and use a PKI to verify them to make an identity 
assertion. However, this is not the most common way; instead, self- 
signed certificates are common to use to establish trust through 
signaling or other third-party solutions. 


DTLS-SRTP key management can use the signaling protocol in four ways: 
First, to agree on using DTLS-SRTP for media security. Second, to 
determine the network location (address and port) where each side is 
running a DTLS listener to let the parts perform the key management 
handshakes that generate the keys used by SRTP. Third, to exchange 
hashes of each side’s certificates to bind these to the signaling and 
ensure there is no MITM attack. This assumes that one can trust the 
signaling solution to be resistant to modification and not be in 
collaboration with an attacker. Finally, to provide an asserted 
identity, e.g., [RFC4474], that can be used to prevent modification 
of the signaling and the exchange of certificate hashes. That way, 
it enables binding between the key exchange and the signaling. 


This usage is well defined for SIP/SDP in [RFC5763] and, in most 
cases, can be adopted for use with other bidirectional signaling 
solutions. It is to be noted that there is work underway to revisit 
the SIP Identity mechanism [RFC4474] in the IETF STIR working group. 


The main question regarding DTLS-SRTP’s security properties is how 
one verifies any peer identity or at least prevents MITM attacks. 
This does require trust in some DTLS-SRTP external parties: either a 
PKI, a Signaling system, or some identity provider. 


DTLS-SRTP usage is clearly on the rise. It is mandatory to support 
in Web Real-Time Communication (WebRTC). It has growing support 
among SIP endpoints. DTLS-SRTP was developed in IETF primarily to 
meet security requirements for RTP-based media established using SIP. 
The requirements considered can be reviewed in "Requirements and 
Analysis of Media Security Management Protocols" [RFC5479]. 
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3.1.2. Key Management for SRTP: MIKEY 


Multimedia Internet Keying (MIKEY) [RFC3830] is a keying protocol 
that has several modes with different properties. MIKEY can be used 
in point-to-point applications using SIP and RTSP (e.g., VoIP calls) 
but is also suitable for use in broadcast and multicast applications 
and centralized group communications. 


MIKEY can establish multiple security contexts or cryptographic 


sessions with a single message. It is usable in scenarios where one 
entity generates the key and needs to distribute the key to a number 
of participants. The different modes and the resulting properties 


are highly dependent on the cryptographic method used to establish 
the session keys actually used by the security protocol, like SRTP. 


MIKEY has the following modes of operation: 


Pre-Shared Key: Uses a pre-shared secret for symmetric key crypto 
used to secure a keying message carrying the already-generated 
session key. This system is the most efficient from the 
perspective of having small messages and processing demands. The 
downside is scalability, where usually the effort for the 
provisioning of pre-shared keys is only manageable if the number 
of endpoints is small. 


Public Key Encryption: Uses a public key crypto to secure a keying 
message carrying the already-generated session key. This is more 
resource intensive but enables scalable systems. It does require 
a public key infrastructure to enable verification. 


Diffie-Hellman: Uses Diffie-Hellman key agreement to generate the 
session key, thus providing perfect forward secrecy. The downside 
is high resource consumption in bandwidth and processing during 
the MIKEY exchange. This method can’t be used to establish group 
keys as each pair of peers performing the MIKEY exchange will 
establish different keys. 


HMAC-Authenticated Diffie-Hellman: [RFC4650] defines a variant of 
the Diffie-Hellman exchange that uses a pre-shared key in a keyed 
Hashed Message Authentication Code (HMAC) to verify authenticity 
of the keying material instead of a digital signature as in the 
previous method. This method is still restricted to 
point-to-point usage. 


RSA-R: MIKEY-RSA in Reverse mode [RFC4738] is a variant of the 
public key method, which doesn’t rely on the initiator of the key 
exchange knowing the responder’s certificate. This method lets 
both the initiator and the responder specify the session keying 
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material depending on the use case. Usage of this mode requires 
one round-trip time. 


TICKET: Ticket Payload (TICKET) [RFC6043] is a MIKEY extension using 
a trusted centralized key management service (KMS). The initiator 
and responder do not share any credentials; instead, they trust a 
third party, the KMS, with which they both have or can establish 
shared credentials. 


IBAKE: Identity-Based Authenticated Key Exchange (IBAKE) [RFC6267] 
uses a KMS infrastructure but with lower demand on the KMS. It 
claims to provide both perfect forward and backwards secrecy. 


SAKKE: [RFC6509] provides Sakai-Kasahara Key Encryption (SAKKE) in 
MIKEY. It is based on Identity-based Public Key Cryptography and 
a KMS infrastructure to establish a shared secret value and 
certificateless signatures to provide source authentication. Its 
features include simplex transmission, scalability, low-latency 
call setup, and support for secure deferred delivery. 


MIKEY messages have several different transports. [RFC4567] defines 
how MIKEY messages can be embedded in general SDP for usage with the 
signaling protocols SIP, Session Announcement Protocol (SAP), and 
RTSP. There also exists a usage of MIKEY defined by the Third 
Generation Partnership Project (3GPP) that sends MIKEY messages 
directly over UDP [T3GPP.33.246] to key the receivers of Multimedia 
Broadcast and Multicast Service (MBMS) [T3GPP.26.346]. [RFC3830] 
defines the application/mikey media type, allowing MIKEY to be used 
in, e.g., email and HTTP. 


Based on the many choices, it is important to consider the properties 
needed in one’s solution and based on that evaluate which modes are 
candidates for use. More information on the applicability of the 
different MIKEY modes can be found in [RFC5197]. 


MIKEY with pre-shared keys is used by 3GPP MBMS [T3GPP.33.246], and 
IMS media security [T3GPP.33.328] specifies the use of the TICKET 
mode transported over SIP and HTTP. RTSP 2.0 [RTSP] specifies use of 
the RSA-R mode. There are some SIP endpoints that support MIKEY. 

The modes they use are unknown to the authors. 


3.1.3. Key Management for SRTP: Security Descriptions 


[RFC4568] provides a keying solution based on sending plaintext keys 
in SDP [RFC4566]. It is primarily used with SIP and the SDP Offer/ 
Answer model and is well defined in point-to-point sessions where 
each side declares its own unique key. Using security descriptions 
to establish group keys is less well defined and can have security 
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issues since it’s difficult to guarantee unique SSRCs (as needed to 
avoid a "two-time pad" attack -- see Section 9 of [RFC3711]). 


Since keys are transported in plaintext in SDP, they can easily be 
intercepted unless the SDP carrying protocol provides strong 
end-to-end confidentiality and authentication guarantees. This is 
not normally the case; instead, hop-by-hop security is provided 
between signaling nodes using TLS. This leaves the keying material 
sensitive to capture by the traversed signaling nodes. Thus, in most 
cases, the security properties of security descriptions are weak. 
The usage of security descriptions usually requires additional 
security measures; for example, the signaling nodes are trusted and 
protected by strict access control. Usage of security descriptions 
requires careful design in order to ensure that the security goals 
can be met. 


Security descriptions are the most commonly deployed keying solution 
for SIP-based endpoints, where almost all endpoints that support SRTP 
also support security descriptions. It is also used for access 
protection in IMS Media Security [T3GPP.33.328]. 


3.1.4. Key Management for SRTP: Encrypted Key Transport 


Encrypted Key Transport (EKT) [EKT] is an SRTP extension that enables 
group keying despite using a keying mechanism like DTLS-SRTP that 
doesn’t support group keys. It is designed for centralized 
conferencing, but it can also be used in sessions where endpoints 
connect to a conference bridge or a gateway and need to be 
provisioned with the keys each participant on the bridge or gateway 
uses to avoid decryption and encryption cycles. This can enable 
interworking between DTLS-SRTP and other keying systems where either 
party can set the key (e.g., interworking with security 
descriptions). 


The mechanism is based on establishing an additional EKT key, which 
everyone uses to protect their actual session key. The actual 
session key is sent in an expanded authentication tag to the other 
session participants. This key is only sent occasionally or 
periodically depending on use cases and depending on what 
requirements exist for timely delivery or notification. 


The only known deployment of EKT so far is in some Cisco video 
conferencing products. 
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3.1.5. Key Management for SRTP: ZRTP and Other Solutions 


The ZRTP [RFC6189] key management system for SRTP was proposed as an 
alternative to DTLS-SRTP. ZRTP provides best effort encryption 
independent of the signaling protocol and utilizes key continuity, 
Short Authentication Strings, or a PKI for authentication. ZRTP 
wasn’t adopted as an IETF Standards Track protocol, but was instead 
published as an Informational RFC in the IETF stream. Commercial 
implementations exist. 


Additional proprietary solutions are also known to exist. 
3.2. RTP Legacy Confidentiality 


Section 9 of the RTP standard [RFC3550] defines a Data Encryption 
Standard (DES) or 3DES-based encryption of RTP and RTCP packets. 
This mechanism is keyed using plaintext keys in SDP [RFC4566] using 
the "k=" SDP field. This method can provide confidentiality but, as 
discussed in Section 9 of [RFC3550], it has extremely weak security 
properties and is not to be used. 


3.3. IPsec 


IPsec [RFC4301] can be used in either tunnel or transport mode to 
protect RTP and RTCP packets in transit from one network interface to 
another. This can be sufficient when the network interfaces have a 
direct relation or in a secured environment where it can be 
controlled who can read the packets from those interfaces. 


The main concern with using IPsec to protect RIP traffic is that in 
most cases, using a VPN approach that terminates the security 
association at some node prior to the RTP endpoint leaves the traffic 
vulnerable to attack between the VPN termination node and the 
endpoint. Thus, usage of IPsec requires careful thought and design 
of its usage so that it meets the security goals. An important 
question is how one ensures the IPsec terminating peer and the 
ultimate destination are the same. Applications can have issues 
using existing APIs when determining if IPsec is being used or not 
and when determining who the authenticated peer entity is when IPsec 
is used. 


IPsec with RTP is more commonly used as a security solution between 
infrastructure nodes that exchange many RTP sessions and media 
streams. The establishment of a secure tunnel between such nodes 
minimizes the key management overhead. 
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3.4. RTP over TLS over TCP 


Just as RTP can be sent over TCP [RFC4571], it can also be sent over 
TLS over TCP [RFC4572], using TLS to provide point-to-point security 
services. The security properties TLS provides are confidentiality, 
integrity protection, and possible source authentication if the 
client or server certificates are verified and provide a usable 
identity. When used in multiparty scenarios using a central node for 
media distribution, the security provided is only between the central 
node and the peers, so the security properties for the whole session 
are dependent on what trust one can place in the central node. 


RTSP 1.0 [RFC2326] and 2.0 [RTSP] specify the usage of RTP over the 
same TLS/TCP connection that the RTSP messages are sent over. It 
appears that RTP over TLS/TCP is also used in some proprietary 
solutions that use TLS to bypass firewalls. 


3.5. RIP over Datagram TLS (DTLS) 


DTLS [RFC6347] is based on TLS [RFC5246] but designed to work over an 
unreliable datagram-oriented transport rather than requiring reliable 
byte stream semantics from the transport protocol. Accordingly, DTLS 
can provide point-to-point security for RTP flows analogous to that 
provided by TLS but over a datagram transport such as UDP. The two 
peers establish a DTLS association between each other, including the 
possibility to do certificate-based source authentication when 
establishing the association. All RTP and RTCP packets flowing will 
be protected by this DTLS association. 


Note that using DTLS for RTP flows is different from using DTLS-SRTP 
key management. DTLS-SRTP uses the same key management steps as 
DTLS, but uses SRTP for the per-packet security operations. Using 
DTLS for RTP flows uses the normal datagram TLS data protection, 
wrapping complete RTP packets. When using DTLS for RTP flows, the 
RTP and RTCP packets are completely encrypted with no headers in the 
clear; when using DTLS-SRTP, the RTP headers are in the clear and 
only the payload data is encrypted. 


DTLS can use similar techniques to those available for DTLS-SRTP to 
bind a signaling-side agreement to communicate to the certificates 
used by the endpoint when doing the DTLS handshake. This enables use 
without having a certificate-based trust chain to a trusted 
certificate root. 


There does not appear to be significant usage of DTLS for RTP. 
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3.6. Media Content Security/Digital Rights Management 


Mechanisms have been defined that encrypt only the media content 
operating within the RTP payload data and leaving the RTP headers and 
RTCP unaffected. There are several reasons why this might be 
appropriate, but a common rationale is to ensure that the content 
stored by RTSP streaming servers has the media content in a protected 
format that cannot be read by the streaming server (this is mostly 
done in the context of Digital Rights Management). These approaches 
then use a key management solution between the rights provider and 
the consuming client to deliver the key used to protect the content 
and do not give the media server access to the security context. 

Such methods have several security weaknesses such as the fact that 
the same key is handed out to a potentially large group of receiving 
clients, increasing the risk of a leak. 


Use of this type of solution can be of interest in environments that 
allow middleboxes to rewrite the RTP headers and select which streams 
are delivered to an endpoint (e.g., some types of centralized video 
conference systems). The advantage of encrypting and possibly 
integrity protecting the payload but not the headers is that the 
middlebox can’t eavesdrop on the media content, but it can still 
provide stream switching functionality. The downside of such a 
system is that it likely needs two levels of security: the payload- 
level solution, to provide confidentiality and source authentication, 
and a second layer with additional transport security ensuring source 
authentication and integrity of the RTP headers associated with the 
encrypted payloads. This can also result in the need to have two 
different key management systems as the entity protecting the packets 
and payloads are different with a different set of keys. 


The aspect of two tiers of security are present in ISMACryp (see 
Section 3.6.1) and the deprecated 3GPP Packet-switched Streaming 
Service solution; see Annex K of [T3GPP.26.234R8]. 


3.6.1. ISMA Encryption and Authentication 


The Internet Streaming Media Alliance (ISMA) has defined ISMA 
Encryption and Authentication 2.0 [ISMACryp2]. This specification 
defines how one encrypts and packetizes the encrypted application 
data units (ADUs) in an RTP payload using the MPEG-4 generic payload 
format [RFC3640]. The ADU types that are allowed are those that can 
be stored as elementary streams in an ISO Media File format-—based 
file. ISMACryp uses SRTP for packet-level integrity and source 
authentication from a streaming server to the receiver. 
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Key management for an ISMACryp-based system can be achieved through 
Open Mobile Alliance (OMA) Digital Rights Management 2.0 [OMADRMv2], 
for example. 


4. Securing RTP Applications 


In the following, we provide guidelines for how to choose appropriate 
security mechanisms for RTP applications. 


4.1. Application Requirements 


This section discusses a number of application requirements that need 
to be considered. An application designer choosing security 
solutions requires a good understanding of what level of security is 
needed and what behavior they strive to achieve. 


4.1.1. Confidentiality 


When it comes to confidentiality of an RTP session, there are several 
aspects to consider: 


Probability of compromise: When using encryption to provide media 
confidentiality, it is necessary to have some rough understanding 
of the security goal and how long one can expect the protected 
content to remain confidential. National or other regulations 
might provide additional requirements on a particular usage of an 
RTP. From that, one can determine which encryption algorithms are 
to be used from the set of available transforms. 


Potential for other leakage: RTP-based security in most of its forms 
simply wraps RTP and RTCP packets into cryptographic containers. 
This commonly means that the size of the original RTP payload is 
visible to observers of the protected packet flow. This can 
provide information to those observers. A well-documented case is 
the risk with variable bitrate speech codecs that produce 
different sized packets based on the speech input [RFC6562]. 
Potential threats such as these need to be considered and, if they 
are significant, then restrictions will be needed on mode choices 
in the codec, or additional padding will need to be added to make 
all packets equal size and remove the informational leakage. 


Another case is RTP header extensions. If SRTP is used, header 
extensions are normally not protected by the security mechanism 
protecting the RTP payload. If the header extension carries 


information that is considered sensitive, then the application 
needs to be modified to ensure that mechanisms used to protect 
against such information leakage are employed. 
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Who has access: When considering the confidentiality properties of a 
system, it is important to consider where the media handled in the 
clear. For example, if the system is based on an RTP mixer that 
needs the keys to decrypt the media, process it, and repacketize 
it, then is the mixer providing the security guarantees expected 
by the other parts of the system? Furthermore, it is important to 
consider who has access to the keys. The policies for the 
handling of the keys, and who can access the keys, need to be 
considered along with the confidentiality goals. 


As can be seen, the actual confidentiality level has likely more to 
do with the application’s usage of centralized nodes, and the details 
of the key management solution chosen, than with the actual choice of 
encryption algorithm (although, of course, the encryption algorithm 
needs to be chosen appropriately for the desired security level). 


4.1.2. Integrity 


Protection against modification of content by a third party, or due 
to errors in the network, is another factor to consider. The first 
aspect that one assesses is what resilience one has against 
modifications to the content. Some media types are extremely 
sensitive to network bit errors, whereas others might be able to 
tolerate some degree of data corruption. Equally important is to 
consider the sensitivity of the content, who is providing the 
integrity assertion, what is the source of the integrity tag, and 
what are the risks of modifications happening prior to that point 
where protection is applied. These issues affect what cryptographic 
algorithm is used, the length of the integrity tags, and whether the 
entire payload is protected. 


RTP applications that rely on central nodes need to consider if 
hop-by-hop integrity is acceptable or if true end-to-end integrity 
protection is needed. Is it important to be able to tell ifa 
middlebox has modified the data? There are some uses of RTP that 
require trusted middleboxes that can modify the data in a way that 
doesn’t break integrity protection as seen by the receiver, for 
example, local advertisement insertion in IPTV systems. There are 
also uses where it is essential that such in-network modification be 
detectable. RIP can support both with appropriate choices of 
security mechanisms. 


Integrity of the data is commonly closely tied to the question of 


source authentication. That is, it becomes important to know who 
makes an integrity assertion for the data. 
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4.1.3. Source Authentication 


Source authentication is about determining who sent a particular RTP 
or RTCP packet. It is normally closely tied with integrity, since a 
receiver generally also wants to ensure that the data received is 
what the source really sent, so source authentication without 
integrity is not particularly useful. Similarly, integrity 
protection without source authentication is also not particularly 
useful; a claim that a packet is unchanged that cannot itself be 
validated as from the source (or some from other known and trusted 
party) is meaningless. 


Source authentication can be asserted in several different ways: 


Base level: Using cryptographic mechanisms that give authentication 
with some type of key management provide an implicit method for 
source authentication. Assuming that the mechanism has sufficient 
strength not to be circumvented in the time frame when you would 
accept the packet as valid, it is possible to assert a source- 
authenticated statement; this message is likely from a source that 
has the cryptographic key(s) to this communication. 


What that assertion actually means is highly dependent on the 
application and how it handles the keys. If only the two peers 
have access to the keys, this can form a basis for a strong trust 
relationship that traffic is authenticated coming from one of the 
peers. However, in a multiparty scenario where security contexts 
are shared among participants, most base-level authentication 
solutions can’t even assert that this packet is from the same 
source as the previous packet. 


Binding the source and the signaling: A step up in the assertion 
that can be done in base-level systems is to tie the signaling to 
the key exchange. Here, the goal is to at least be able to assert 
that the source of the packets is the same entity with which the 
receiver established the session. How feasible this is depends on 
the properties of the key management system, the ability to tie 
the signaling to a particular source, and the degree of trust the 
receiver places on the different nodes involved. 


For example, systems where the key exchange is done using the 
signaling systems, such as security descriptions [RFC4568] enable 
a direct binding between signaling and key exchange. In such 
systems, the actual security depends on the trust one can place in 
the signaling system to correctly associate the peer’s identifier 
with the key exchange. 
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Using identifiers: If the applications have access to a system that 
can provide verifiable identifiers, then the source authentication 
can be bound to that identifier. For example, in a point-to-point 
communication, even symmetric key crypto, where the key management 
can assert that the key has only been exchanged with a particular 
identifier, can provide a strong assertion about the source of the 
traffic. SIP Identity [RFC4474] provides one example of how this 
can be done and could be used to bind DTLS-SRTP certificates used 
by an endpoint to the identity provider’s public key to 
authenticate the source of a DTLS-SRTP flow. 


Note that all levels of the system need to have matching 
capability to assert identifiers. If the signaling can assert 
that only a given entity in a multiparty session has a key, then 
the media layer might be able to provide guarantees about the 
identifier used by the media sender. However, using a signaling 
authentication mechanism built on a group key can limit the media 
layer to asserting only group membership. 


4.1.4. Identifiers and Identity 


There exist many different types of systems providing identifiers 
with different properties (e.g., SIP Identity [RFC4474]). In the 
context of RIP applications, the most important property is the 
possibility to perform source authentication and verify such 
assertions in relation to any claimed identifiers. What an 
identifier really represents can also vary but, in the context of 
communication, one of the most obvious is the identifiers 
representing the identity of the human user with which one 
communicates. However, the human user can also have additional 
identifiers in a particular role. For example, the human (Alice) can 
also be a police officer, and in some cases, an identifier for her 
role as police officer will be more relevant than one that asserts 
that she is Alice. This is common in contact with organizations, 
where it is important to prove the person’s right to represent the 
organization. Some examples of identifier/identity mechanisms that 
can be used: 


Certificate based: A certificate is used to assert the identifiers 
used to claim an identity; by having access to the private part of 
the certificate, one can perform signing to assert one’s identity. 
Any entity interested in verifying the assertion then needs the 
public part of the certificate. By having the certificate, one 
can verify the signature against the certificate. The next step 
is to determine if one trusts the certificate’s trust chain. 
Commonly, by provisioning the verifier with the public part of a 
root certificate, this enables the verifier to verify a trust 
chain from the root certificate down to the identifier in the 
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certificate. However, the trust is based on all steps in the 
certificate chain being verifiable and trusted. Thus, the 
provisioning of root certificates and the ability to revoke 
compromised certificates are aspects that will require 
infrastructure. 


Online identity providers: An online identity provider (IdP) can 
authenticate a user’s right to use an identifier and then perform 
assertions on their behalf or provision the requester with short- 
term credentials to assert the identifiers. The verifier can then 
contact the IdP to request verification of a particular 
identifier. Here, the trust is highly dependent on how much one 
trusts the IdP. The system also becomes dependent on having 
access to the relevant IdP. 


In all of the above examples, an important part of the security 
properties is related to the method for authenticating the access to 
the identity. 


-1.5. Privacy 


RTP applications need to consider what privacy goals they have. As 
RTP applications communicate directly between peers in many cases, 


the IP addresses of any communication peer will be available. The 
main privacy concern with IP addresses is related to geographical 
location and the possibility to track a user of an endpoint. The 


main way to avoid such concerns is the introduction of relay (e.g., a 
Traversal Using Relay NAT (TURN) server [RFC5766]) or centralized 
media mixers or forwarders that hide the address of a peer from any 
other peer. The security and trust placed in these relays obviously 
needs to be carefully considered. 


RTP itself can contribute to enabling a particular user to be tracked 
between communication sessions if the Canonical Name (CNAME) is 
generated according to the RTP specification in the form of 
user@host. Such RTCP CNAMEs are likely long-term stable over 
multiple sessions, allowing tracking of users. This can be desirable 
for long-term fault tracking and diagnosis, but it clearly has 
privacy implications. Instead, cryptographically random ones could 
be used as defined by "Guidelines for Choosing RTP Control Protocol 
(RTCP) CNAMES" [RFC7022]. 


If privacy goals exist, they need to be considered and the system 
designed with them in mind. In addition, certain RTP features might 
have to be configured to safeguard privacy or have requirements on 
how the implementation is done. 
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4.2. Application Structure 


When it comes to RTP security, the most appropriate solution is often 
highly dependent on the topology of the communication session. The 
signaling also impacts what information can be provided and if this 
can be instance specific or common for a group. In the end, the key 
management system will highly affect the security properties achieved 
by the application. At the same time, the communication structure of 
the application limits what key management methods are applicable. 

As different key management methods have different requirements on 
underlying infrastructure, it is important to take that aspect into 
consideration early in the design. 


4.3. Automatic Key Management 


The guidelines for Cryptographic Key Management [RFC4107] provide an 
overview of why automatic key management is important. They also 
provide a strong recommendation on using automatic key management. 
Most of the security solutions reviewed in this document provide or 
support automatic key management, at least to establish session keys. 
In some more long-term use cases, credentials might need to be 
manually deployed in certain cases. 


For SRTP, an important aspect of automatic key management is to 
ensure that two-time pads do not occur, in particular by preventing 


multiple endpoints using the same session key and SSRC. In these 
cases, automatic key management methods can have strong dependencies 
on signaling features to function correctly. If those dependencies 


can’t be fulfilled, additional constrains on usage, e.g., per- 
endpoint session keys, might be needed to avoid the issue. 


When selecting security mechanisms for an RTP application, it is 
important to consider the properties of the key management. Using 
key management that is both automatic and integrated will provide 
minimal interruption for the user and is important to ensure that 
security can, and will remain, to be on by default. 


4.4. End-to-End Security vs. Tunnels 


If the security mechanism only provides a secured tunnel, for 
example, like some common uses of IPsec (Section 3.3), it is 
important to consider the full end-to-end properties of the system. 
How does one ensure that the path from the endpoint to the local 
tunnel ingress/egress is secure and can be trusted (and similarly for 
the other end of the tunnel)? How does one handle the source 
authentication of the peer, as the security protocol identifies the 
other end of the tunnel? These are some of the issues that arise 
when one considers a tunnel-based security protocol rather than an 
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end-to-end one. Even with clear requirements and knowledge that one 
still can achieve the security properties using a tunnel-based 
solution, one ought to prefer to use end-to-end mechanisms, as they 
are much less likely to violate any assumptions made about 
deployment. These assumptions can also be difficult to automatically 
verify. 


4.5. Plaintext Keys 


Key management solutions that use plaintext keys, like SDP security 
descriptions (Section 3.1.3), require care to ensure a secure 
transport of the signaling messages that contain the plaintext keys. 
For plaintext keys, the security properties of the system depend on 
how securely the plaintext keys are protected end-to-end between the 
sender and receiver(s). Not only does one need to consider what 
transport protection is provided for the signaling message, including 
the keys, but also the degree to which any intermediaries in the 
signaling are trusted. Untrusted intermediaries can perform MITM 
attacks on the communication or can log the keys, resulting in the 
encryption being compromised significantly after the actual 
communication occurred. 


4.6. Interoperability 


Few RTP applications exist as independent applications that never 
interoperate with anything else. Rather, they enable communication 
with a potentially large number of other systems. To minimize the 
number of security mechanisms that need to be implemented, it is 
important to consider if one can use the same security mechanisms as 
other applications. This can also reduce problems with determining 
what security level is actually negotiated in a particular session. 


The desire to be interoperable can, in some cases, be in conflict 
with the security requirements of an application. To meet the 
security goals, it might be necessary to sacrifice interoperability. 
Alternatively, one can implement multiple security mechanisms; this, 
however, introduces the complication of ensuring that the user 
understands what it means to use a particular security system. In 
addition, the application can then become vulnerable to bid-down 
attacks. 


5. Examples 
In the following, we describe a number of example security solutions 
for applications using RTP services or frameworks. These examples 


are provided to illustrate the choices available. They are not 
normative recommendations for security. 
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5.1. Media Security for SIP-Established Sessions Using DTLS-SRTP 


In 2009, the IETF evaluated media security for RTP sessions 
established using point-to-point SIP sessions. A number of 
requirements were determined, and based on those, the existing 
solutions for media security and especially the keying methods were 
analyzed. The resulting requirements and analysis were published in 
[RFC5479]. Based on this analysis and working group discussion, 
DTLS-SRTP was determined to be the best solution. 


The security solution for SIP using DTLS-SRTIP is defined in 
"Framework for Establishing a Secure Real-time Transport Protocol 
(SRTP) Security Context Using Datagram Transport Layer Security 
(DTLS)" [RFC5763]. On a high level, the framework uses SIP with SDP 
offer/answer procedures to exchange the network addresses where the 
server endpoint will have a DTLS-SRTP-enabled server running. The 
SIP signaling is also used to exchange the fingerprints of the 
certificate each endpoint will use in the DTLS establishment process. 
When the signaling is sufficiently completed, the DTLS-SRTP client 
performs DTLS handshakes and establishes SRTP session keys. The 
clients also verify the fingerprints of the certificates to verify 
that no man in the middle has inserted themselves into the exchange. 


DTLS has a number of good security properties. For example, to 
enable a MITM, someone in the signaling path needs to perform an 
active action and modify both the signaling message and the DTLS 


handshake. Solutions also exist that enable the fingerprints to be 
bound to identities. SIP Identity provides an identity established 
by the first proxy for each user [RFC4474]. This reduces the number 


of nodes the connecting User Agent has to trust to include just the 
first-hop proxy rather than the full signaling path. The biggest 
security weakness of this system is its dependency on the signaling. 
SIP signaling passes multiple nodes and there is usually no message 
security deployed, only hop-by-hop transport security, if any, 
between the nodes. 


5.2. Media Security for WebRTC Sessions 


Web Real-Time Communication (WebRTC) [WebRTC] is a solution providing 
JavaScript web applications with real-time media directly between 
browsers. Media is transported using RTP and protected using a 
mandatory application of SRTP [RFC3711], with keying done using DTLS- 
SRTP [RFC5764]. The security configuration is further defined in 
"WebRTC Security Architecture" [WebRTC-SEC]. 


A hash of the peer’s certificate is provided to the JavaScript web 


application, allowing that web application to verify identity of the 
peer. There are several ways in which the certificate hashes can be 
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verified. An approach identified in the WebRTC security architecture 
[WebRTC-SEC] is to use an identity provider. In this solution, the 
identity provider, which is a third party to the web application, 
signs the DTLS-SRTP hash combined with a statement on the validity of 
the user identity that has been used to sign the hash. The receiver 
of such an identity assertion can then independently verify the user 
identity to ensure that it is the identity that the receiver intended 
to communicate with, and that the cryptographic assertion holds; this 
way, a user can be certain that the application also can’t perform a 
MITM and acquire the keys to the media communication. Other ways of 
verifying the certificate hashes exist; for example, they could be 
verified against a hash carried in some out-of-band channel (e.g., 
compare with a hash printed on a business card) or using a verbal 
short authentication string (e€.g., as in ZRTP [RFC6189]) or using 
hash continuity. 


In the development of WebRTC, there has also been attention given to 
privacy considerations. The main RTP-related concerns that have been 
raised are: 


Location disclosure: As Interactive Connectivity Establishment (ICE) 
negotiation [RFC5245] provides IP addresses and ports for the 
browser, this leaks location information in the signaling to the 
peer. To prevent this, one can block the usage of any ICE 
candidate that isn’t a relay candidate, i.e., where the IP and 
port provided belong to the service providers media traffic relay. 


Prevent tracking between sessions: Static RTP CNAMEs and DTLS-SRTP 
certificates provide information that is reused between session 
instances. Thus, to prevent tracking, such information ought not 
be reused between sessions, or the information ought not be sent 
in the clear. Note that generating new certificates each time 
prevents continuity in authentication, however, as WebRTC users 
are expected to use multiple devices to access the same 
communication service, such continuity can’t be expected anyway; 
instead, the above-described identity mechanism has to be relied 
on. 


Note: The above cases are focused on providing privacy from other 
parties, not on providing privacy from the web server that provides 
the WebRTC JavaScript application. 


5.3. IP Multimedia Subsystem (IMS) Media Security 


In IMS, the core network is controlled by a single operator or by 
several operators with high trust in each other. Except for some 
types of accesses, the operator is in full control, and no packages 
are routed over the Internet. Nodes in the core network offer 
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services such as voice mail, interworking with legacy systems (Public 
Switched Telephone Network (PSTN), Global System for Mobile 
Communications (GSM), and 3G), and transcoding. Endpoints are 
authenticated during the SIP registration using either IMS and 
Authentication and Key Agreement (AKA) (using Subscriber Identity 
Module (SIM) credentials) or SIP Digest (using a password). 


In IMS media security [T3GPP.33.328], end-to-end encryption is, 
therefore, not seen as needed or desired as it would hinder, for 
example, interworking and transcoding, making calls between 
incompatible terminals impossible. Because of this, IMS media 
security mostly uses end-to-access-edge security where SRTP is 
terminated in the first node in the core network. As the SIP 
signaling is trusted and encrypted (with TLS or IPsec), security 
descriptions [RFC4568] is considered to give good protection against 
eavesdropping over the accesses that are not already encrypted (GSM, 
3G, and Long Term Evolution (LTE)). Media source authentication is 
based on knowledge of the SRTP session key and trust in that the IMS 
network will only forward media from the correct endpoint. 


For enterprises and government agencies, which might have weaker 
trust in the IMS core network and can be assumed to have compatible 
terminals, end-to-end security can be achieved by deploying their own 
key management server. 


Work on interworking with WebRTC is currently ongoing; the security 
will still be end-to-access-edge but using DTLS-SRTP [RFC5763] 
instead of security descriptions. 


5.4. 3GPP Packet-Switched Streaming Service (PSS) 


The 3GPP Release 11 PSS specification of the Packet-switched 
Streaming Service (PSS) [T3GPP.26.234R11] defines, in Annex R, a set 


of security mechanisms. These security mechanisms are concerned with 
protecting the content from being copied, i.e., Digital Rights 
Management (DRM). To meet these goals with the specified solution, 


the client implementation and the application platform are trusted to 
protect against access and modification by an attacker. 


PSS is media controlled by RTSP 1.0 [RFC2326] streaming over RTP. 
Thus, an RTSP client whose user wants to access a protected content 
will request a session description (SDP [RFC4566]) for the protected 
content. This SDP will indicate that the media is protected by 
ISMACryp 2.0 [ISMACryp2] encoding application units (AUs). The 
key(s) used to protect the media is provided in one of two ways. If 
a single key is used, then the client uses some DRM system to 
retrieve the key as indicated in the SDP. Commonly, OMA DRM v2 
[OMADRMv2] will be used to retrieve the key. If multiple keys are to 
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be used, then an additional RTSP stream for key updates in parallel 
with the media streams is established, where key updates are sent to 
the client using Short Term Key Messages defined in the "Service and 
Content Protection for Mobile Broadcast Services" part [OMASCP] of 
the OMA Mobile Broadcast Services [OMABCAST]. 


Worth noting is that this solution doesn’t provide any integrity 
verification method for the RTP header and payload header 
information; only the encoded media AU is protected. 3GPP has not 
defined any requirement for supporting any solution that could 
provide that service. Thus, replay or insertion attacks are 
possible. Another property is that the media content can be 
protected by the ones providing the media, so that the operators of 
the RTSP server have no access to unprotected content. Instead, all 
that want to access the media are supposed to contact the DRM keying 
server, and if the device is acceptable, they will be given the key 
to decrypt the media. 


To protect the signaling, RTSP 1.0 supports the usage of TLS. This 
is, however, not explicitly discussed in the PSS specification. 

Usage of TLS can prevent both modification of the session description 
information and help maintain some privacy of what content the user 
is watching as all URLs would then be confidentiality protected. 


Deore RTSP (2.0 


The Real-time Streaming Protocol 2.0 [RTSP] offers an interesting 
comparison to the PSS service (Section 5.4) that is based on RTSP 1.0 
and service requirements perceived by mobile operators. A major 
difference between RTSP 1.0 and RTSP 2.0 is that 2.0 is fully defined 
under the requirement to have a mandatory-to-implement security 
mechanism. As it specifies one transport media over RTP, it is also 
defining security mechanisms for the RTP-transported media streams. 


The security goal for RTP in RTSP 2.0 is to ensure that there is 
confidentiality, integrity, and source authentication between the 
RTSP server and the client. This to prevent eavesdropping on what 
the user is watching for privacy reasons and to prevent replay or 
injection attacks on the media stream. To reach these goals, the 
signaling also has to be protected, requiring the use of TLS between 
the client and server. 


Using TLS-protected signaling, the client and server agree on the 
media transport method when doing the SETUP request and response. 
The secured media transport is SRTP (SAVP/RTP) normally over UDP. 
The key management for SRTP is MIKEY using RSA-R mode. The RSA-R 
mode is selected as it allows the RTSP server to select the key 
despite having the RTSP client initiate the MIKEY exchange. It also 
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enables the reuse of the RTSP server’s TLS certificate when creating 
the MIKEY messages, thus ensuring a binding between the RTSP server 
and the key exchange. Assuming the SETUP process works, this will 
establish a SRTP crypto context to be used between the RTSP server 
and the client for the RTP-transported media streams. 


6. Security Considerations 
This entire document is about security. Please read it. 
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